Abstract. This paper proposes a novel method for recommending books to pupils based on a framework called Edu-mining. One property of the proposed method is that it uses only loan histories (pupil ID, book ID, date of loan), whereas conventional methods require additional information, such as taste information from a great number of users, which is costly to obtain. To achieve this, the proposed method solves the book recommendation problem as a problem of loan date prediction, relying solely on loan histories. Experiments show that the proposed method achieves an accuracy of 60% and outperforms the method used for comparison (weighted slope one collaborative filtering). In addition to its performance, the proposed method has the following two advantages: (i) it is inexpensive compared to the conventional methods and (ii) the reading level of the recommended books is adjustable.


1 Introduction 

Reading is one of the most essential intellectual activities for pupils. Librarians in school libraries play an important role in facilitating this activity by recommending appropriate books to pupils. However, schools often do not have enough librarians. Besides, it is impossible for a librarian to remember all the book information and the preferences of every single pupil in a school.

To solve this problem, there has been work on automatic book recommendation. Whichbook [7] is a book recommendation system that helps readers find books based on mood and style parameters, such as happy or sad, that they specify. A drawback of the system is that it requires collecting training data to tune the mood and style parameters, which is costly and time-consuming. Besides, it is questionable whether pupils, or even teachers, can find appropriate books with good educational effects by using the system, since it has more than 100 parameter settings.

Another approach is collaborative filtering [1]. Collaborative filtering is a general way of recommending items, including books [3], movies [2], news articles [5], and music albums [6]. However, collaborative filtering is also costly in that it normally requires collecting taste information from a number of users. In book recommendation, readers have to rate the books they have read. This becomes especially problematic when the readers are pupils, as in our task, because pupils are not always capable of rating books properly. For example, imagine a situation where a teacher recommends a book that is a bit difficult for a pupil in order to give him/her a chance to read books of a higher reading level. The pupil is likely to rate the book as not-good or not-interesting because it is a bit difficult for him/her, even if it has good effects on his/her intellectual development.

In view of this background, this paper proposes a novel method for recommending books to pupils based on the Edu-mining framework [4], which favors inexpensive methods (see Nagata et al. [4] for details of Edu-mining). The proposed method relies solely on loan histories to recommend books. Here, each record in a loan history is a triple: user ID (ID of a pupil), book ID (ID of a book that the pupil borrowed), and date of loan. A pupil's loan history consists of the triples for all the books s/he borrowed. Normally, loan histories are already registered in a database in the school library. Thus, the proposed method needs no training data or taste information, which makes it much less expensive than the conventional methods.

Given loan histories, the proposed method solves the book recommendation problem as a problem of loan date prediction. In other words, it predicts from loan histories when the target pupil will borrow the books that s/he has not borrowed yet. This approach yields two further advantages. One is that the proposed method can estimate the reading levels of the books it recommends, which means that the reading level of the recommended books can be adjusted to the target pupil. The other is that the preferences of pupils are implicitly included in the recommendation results.

The rest of this paper is structured as follows. Section 2 discusses the basic concept of Edu-mining. Section 3 introduces the basic idea of the proposed method. Section 4 describes the proposed method. Section 5 describes experiments conducted to evaluate the proposed method. Section 6 discusses the experimental results, and Section 7 concludes the paper.

2 Edu-mining 

Edu-mining [4] is based on data mining and text mining but differs from them in the following three points. First, the target data are peculiar to education. A good example is pupils' writing, where the distributions of words, mechanics, and style are quite different from those of adults. These differences degrade the performance of tools such as morphological analyzers, which are often used in text mining to extract words, so the information extracted from such data is poor. Another example can be seen in our task. As already mentioned, collaborative filtering predicts user preferences based on previous user preferences (taste information). In book recommendation for pupils, taste information is not as reliable as that of adults; pupils are not always capable of rating books properly, as exemplified in Section 1. These facts imply that Edu-mining has to solve, within its own scheme, the problems that arise from the differences between educational data and ordinary data.

Second, Edu-mining prefers simple and inexpensive techniques. It should be implementable at moderate cost, since it is mainly intended for use in schools. Also, the target users of Edu-mining are mainly teachers and/or students (including pupils). If the techniques used are simple, the target users are likely to find them easy to use. Besides, the users may sometimes be able to give feedback on the techniques.

Third and finally, whereas data/text mining aims at improving the quality of the mined knowledge, this is not necessarily the case in Edu-mining; its ultimate goal is to achieve good educational outcomes. In ordinary book recommendation, the ultimate goal is to find and recommend books that the user wishes to read (and probably to purchase). By contrast, in our task this is not the ultimate goal; the ultimate goal is to facilitate pupils' intellectual development by recommending appropriate books.
This is the basic concept of Edu-mining. The next section describes the basic idea of the proposed method based on Edu-mining.

3 Basic Idea 

So far, we have seen the basic concept of Edu-mining and its relation to book 
recommendation for pupils. This section describes the basic idea of the proposed method 
based on Edu-mining. In book recommendation for pupils, the peculiarity of the data is 
that taste information obtained from pupils may be unreliable as Section 1 describes. The 
proposed method overcomes the problem by not using taste information. Instead, it solves 
the book recommendation problem as a problem of loan date prediction. It uses a simple 
and intuitive way to predict loan dates. 

Before describing the basic idea of the proposed method, let us introduce a new kind of loan date called the absolute loan date. A loan date normally consists of the day, the month, and the grade of the pupil (e.g., 1st Sep., 1st grade). As we will see below, this form of loan date is not suitable for the calculation used in the proposed method, so the absolute loan date is used instead. The absolute loan date is a simple mapping of the normal loan date. The first day of the first grade is the base date and is mapped to 0. Every other loan date is mapped to the number of days between the base date and that date. For example, the date one month after the first day is mapped to 30 (or 31), the first day of the second grade is mapped to 365, and so on. Figure 1 illustrates the mapping between normal loan dates and absolute loan dates.
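The mapping can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: it assumes that each grade counts as exactly 365 days (as in the example above) and that every school year starts on a fixed calendar day, given by the hypothetical parameter `year_start` (set to 1 April here purely for illustration).

```python
from datetime import date
from typing import Tuple

DAYS_PER_GRADE = 365  # assumption: each grade counts as exactly 365 days


def absolute_loan_date(loan: date, grade: int, year_start: Tuple[int, int] = (4, 1)) -> int:
    """Map a calendar loan date and the pupil's grade to an absolute loan date.

    Day 0 is the first day of the first grade. `year_start` is a hypothetical
    (month, day) on which every school year begins.
    """
    month, day = year_start
    start = date(loan.year, month, day)  # school-year start in the loan's calendar year
    if loan < start:
        # The loan belongs to the school year that began in the previous calendar year.
        start = date(loan.year - 1, month, day)
    return (grade - 1) * DAYS_PER_GRADE + (loan - start).days


print(absolute_loan_date(date(2007, 4, 1), 1))  # first day of first grade -> 0
print(absolute_loan_date(date(2008, 4, 1), 2))  # first day of second grade -> 365
```

The leap-year day is simply counted as an ordinary day, which is one of the simplifications of this sketch.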
Figure 1. Mapping between normal loan date and absolute loan date

Here, it is worthwhile to note that absolute loan dates roughly correspond to reading 
levels. Namely, first grade pupils tend to borrow books of low reading levels whereas 
upper grade pupils tend to borrow books of higher reading levels. This implies that if one 
can predict absolute loan dates, s/he can also estimate reading level. This is why reading 
level is adjustable in the recommendation of the proposed method. 

Now let us describe the basic idea of the proposed method. As already mentioned, the proposed method solves the book recommendation problem as a problem of loan date prediction; that is, it predicts absolute loan dates from loan histories. Once it has predicted absolute loan dates, it can easily recommend books to the target pupil because it knows when s/he will borrow the books s/he has not borrowed yet. In the simplest case, it recommends books that are predicted to be borrowed on or near the day of the recommendation. If one wishes to recommend books of a higher reading level, it can recommend books that are predicted to be borrowed some time (say, half a year) after the day of the recommendation; the opposite can also be done.

To see how the proposed method predicts absolute loan dates, suppose that we have the loan histories shown in Figure 2, where loan dates are expressed as absolute loan dates. Figure 2 shows, for example, that pupil A borrowed book A on absolute date 365 (equivalently, the first day of the second grade). Further suppose that we are predicting the absolute loan date of book B for pupil A (the question mark in Figure 2 denotes that pupil A has not borrowed book B yet). If we look at the loan history of pupil B, we notice that s/he borrowed book B 370 days after book A. Based on this, it is natural to predict that pupil A will borrow book B 370 (= 670 - 300) days after pupil A's loan date of book A (365), that is, on absolute date 735 (= 365 + (670 - 300)). Similarly, based on the loan history of pupil C, it is natural to predict the absolute loan date of book B for pupil A to be 725 (= 365 + (710 - 350)). To obtain the final prediction, we take the average of the two absolute loan dates, that is, (735 + 725)/2 = 730 (equivalently, the first day of the third grade). Note that adding the average of the differences between the loan dates to the loan date of the base book gives the same result: 730 = 365 + {(670 - 300) + (710 - 350)}/2.
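The worked example can be checked with a few lines of Python. The dictionary below simply encodes the loan histories described above for Figure 2 (absolute loan dates, with `None` marking a book not yet borrowed); the variable names are illustrative only.

```python
# Loan histories of Figure 2 as absolute loan dates; None = not borrowed yet.
histories = {
    "pupil A": {"book A": 365, "book B": None},
    "pupil B": {"book A": 300, "book B": 670},
    "pupil C": {"book A": 350, "book B": 710},
}
target, base_book, book = "pupil A", "book A", "book B"
base = histories[target][base_book]  # 365

# Gap between the two loans for every other pupil who borrowed both books.
gaps = [
    h[book] - h[base_book]
    for pupil, h in histories.items()
    if pupil != target and h[base_book] is not None and h[book] is not None
]
simple = [base + g for g in gaps]                   # one simple prediction per pupil
prediction = sum(simple) / len(simple)              # average of the simple predictions
prediction_via_gaps = base + sum(gaps) / len(gaps)  # base date + average gap

print(gaps, simple, prediction, prediction_via_gaps)  # [370, 360] [735, 725] 730.0 730.0
```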



Figure 2. Example of loan histories 

Although the loan histories in Figure 2 involve only three pupils and two books for illustration purposes, actual loan histories often involve far more pupils and books. Therefore, in actual use the average is taken over the relevant books and the relevant pupils. A rough definition of relevant pupils and relevant books is as follows (the next section gives the strict definition). A relevant pupil is one who has borrowed both of the following books: (a) one of the books the target pupil borrowed and (b) the book whose absolute loan date is to be predicted. A relevant book is a book that was borrowed by (i) the target pupil and (ii) one or more of the relevant pupils.
This is the basic idea of how the proposed method predicts absolute loan dates from loan histories. The next section describes the prediction method in detail.


4 Proposed Method 


To formalize the prediction method, we use the symbols $p$ and $b$ to denote a pupil and a book, respectively, in the given loan histories. We also use $d_{pb}$ to denote the absolute loan date on which pupil $p$ borrowed book $b$; if pupil $p$ has not borrowed book $b$ yet, $d_{pb}$ is set to $-1$.

Now, let $p$ be the target pupil (the pupil to whom books are recommended) and $b$ be the book whose absolute loan date is to be predicted. Then, a relevant pupil is one who has borrowed both $b$ and one of the books the target pupil borrowed. Thus, the set of relevant pupils is defined by

$$P(b,b') = \{\, p' \mid d_{p'b} \neq -1,\ d_{p'b'} \neq -1 \,\} \tag{1}$$

where $b'$ denotes one of the books the target pupil borrowed. A relevant book is a book that satisfies the following two conditions: (i) the target pupil borrowed it, and (ii) at least one relevant pupil exists for it¹. Using Equation (1), the set of relevant books is defined by

$$B(p,b) = \{\, b' \mid d_{pb'} \neq -1,\ |P(b,b')| > 0 \,\} \tag{2}$$


Using Equation (1) and Equation (2), absolute loan dates are predicted by

$$\hat{d}_{pb} = \frac{\sum_{b' \in B(p,b)} \sum_{p' \in P(b,b')} \{ d_{pb'} + (d_{p'b} - d_{p'b'}) \}}{\sum_{b' \in B(p,b)} |P(b,b')|} \tag{3}$$


Here, $d_{pb'} + (d_{p'b} - d_{p'b'})$ corresponds to the simple prediction of absolute loan dates discussed in the basic idea in Section 3 (for instance, 365 + (670 - 300)). The sums in the numerator add up the simple predictions over the relevant pupils and the relevant books; in the same example, they correspond to 735 + 725. The denominator is the number of simple predictions. Hence, Equation (3) gives the average of the simple predictions. If $|P(b,b')| = 0$ for every $b'$ (that is, if $B(p,b)$ is empty), $\hat{d}_{pb}$ is set to $-1$, meaning that the proposed method cannot predict the absolute loan date.
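As an illustration, Equations (1) to (3) can be sketched in Python as below. This is a hedged sketch rather than the authors' code: loan histories are assumed to be stored in a hypothetical mapping from pupil ID to {book ID: absolute loan date}, a missing entry plays the role of $d_{pb} = -1$, and `None` stands for the text's $-1$ when no prediction can be made. The function also returns the denominator of Equation (3), since it serves as the credibility measure discussed next in the text.

```python
from typing import Dict, Optional, Set, Tuple

# Hypothetical layout: pupil ID -> {book ID: absolute loan date}.
# A book missing from a pupil's dict corresponds to d_{pb} = -1 in the text.
Histories = Dict[str, Dict[str, int]]


def relevant_pupils(h: Histories, b: str, b_prime: str) -> Set[str]:
    """Equation (1): pupils who borrowed both b and b_prime."""
    return {q for q, loans in h.items() if b in loans and b_prime in loans}


def relevant_books(h: Histories, p: str, b: str) -> Set[str]:
    """Equation (2): books borrowed by p that have at least one relevant pupil."""
    return {b_prime for b_prime in h[p] if b_prime != b and relevant_pupils(h, b, b_prime)}


def predict(h: Histories, p: str, b: str) -> Tuple[Optional[float], int]:
    """Equation (3): average of the simple predictions d_{pb'} + (d_{p'b} - d_{p'b'}).

    Returns (prediction, number of simple predictions); the prediction is None
    (the text's -1) when no relevant book exists.
    """
    total, count = 0, 0
    for b_prime in relevant_books(h, p, b):
        for q in relevant_pupils(h, b, b_prime):
            total += h[p][b_prime] + (h[q][b] - h[q][b_prime])
            count += 1
    return (total / count if count else None), count


figure2 = {
    "pupil A": {"book A": 365},
    "pupil B": {"book A": 300, "book B": 670},
    "pupil C": {"book A": 350, "book B": 710},
}
print(predict(figure2, "pupil A", "book B"))  # (730.0, 2)
```

Recomputing `relevant_pupils` inside the loops keeps the sketch close to the equations; a practical version would index loans by book to avoid scanning every pupil repeatedly.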


Intuitively, books whose absolute loan dates are given by Equation (3) are similar in topic to the books that the target pupil borrowed, because the average is taken over the relevant pupils, who have borrowed some of the same books as the target pupil, and over the relevant books, which those pupils have also borrowed. In other words, the book preferences of the target pupil are implicitly included in the prediction through the relevant pupils and the relevant books.

¹ Note that a relevant book is not a book borrowed by the target pupil and anyone who borrowed the book $b$.

Furthermore, Equation (3) takes into account how similar each relevant book is to $b$. This can be seen by noting that Equation (3) can be rewritten as

$$\hat{d}_{pb} = \frac{\sum_{b' \in B(p,b)} \{ |P(b,b')|\, d_{pb'} + \sum_{p' \in P(b,b')} (d_{p'b} - d_{p'b'}) \}}{\sum_{b' \in B(p,b)} |P(b,b')|}$$

In the rewritten version of Equation (3), the base date $d_{pb'}$ (the first term in the numerator) is weighted by the factor $|P(b,b')|$, which denotes the number of pupils who borrowed both $b$ and $b'$. It is reasonable to think that the more pupils have borrowed both books, the more similar the two books are, and in turn it is reasonable to give a higher weight to such a pair in the prediction. Equation (3) does exactly this.
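The rewriting relies only on the fact that $d_{pb'}$ does not depend on $p'$, so the inner sum splits into two parts. Writing $\bar{\Delta}_{bb'}$ (a symbol introduced here only for illustration) for the average gap between $b$ and $b'$ among the relevant pupils makes the weighting explicit:

$$\sum_{p' \in P(b,b')} \{ d_{pb'} + (d_{p'b} - d_{p'b'}) \} = |P(b,b')|\, d_{pb'} + \sum_{p' \in P(b,b')} (d_{p'b} - d_{p'b'}) = |P(b,b')|\, ( d_{pb'} + \bar{\Delta}_{bb'} ),$$

$$\bar{\Delta}_{bb'} = \frac{1}{|P(b,b')|} \sum_{p' \in P(b,b')} (d_{p'b} - d_{p'b'}),$$

so Equation (3) is a weighted average of the per-book estimates $d_{pb'} + \bar{\Delta}_{bb'}$ with weights $|P(b,b')|$ (well defined because every $b' \in B(p,b)$ has $|P(b,b')| > 0$).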

Also, note that the denominator can be regarded as the credibility of the prediction, because it is the number of (relevant pupil, relevant book) pairs involved in the prediction. A prediction made from few relevant pupils and few relevant books is not reliable. Considering this, predictions with $\sum_{b' \in B(p,b)} |P(b,b')| < N$, where $N$ is a certain threshold, are discarded in the book recommendation.
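A minimal sketch of this filtering step in Python, assuming (hypothetically) that predictions are kept as a mapping from book ID to a pair (predicted absolute loan date or `None`, value of the denominator of Equation (3)). The default `n=5` matches the threshold used in the evaluation in Section 5; the function name is our own.

```python
from typing import Dict, Optional, Tuple


def credible(predictions: Dict[str, Tuple[Optional[float], int]], n: int = 5) -> Dict[str, float]:
    """Drop predictions whose support (the sum over b' of |P(b,b')|) is below n."""
    return {
        book: d
        for book, (d, support) in predictions.items()
        if d is not None and support >= n
    }


example = {"x": (730.0, 2), "y": (400.0, 7), "z": (None, 0)}
print(credible(example))  # {'y': 400.0}
```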

Once absolute loan dates have been predicted for the books that the target pupil has not borrowed yet, the proposed method recommends books to the target pupil as follows. It recommends M books (for example, M = 5 or M = 10) that are predicted to be borrowed on or near the day of the recommendation. If a teacher wishes to recommend (or the target pupil wishes to read) books of a higher reading level, it recommends books that are predicted to be borrowed some days after the day of the recommendation. If one wishes the opposite, it recommends books that are predicted to be borrowed some days before the day of the recommendation. The number of days offers an intuitive way to specify reading level: recall that the absolute loan date is simply a one-to-one mapping of the normal loan date, so setting the number of days to 365 after the day of the recommendation corresponds to specifying a reading level one grade higher.
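The selection step can be sketched as below. Reading "on or near the day" as "closest predicted absolute loan date" is our assumption, as are the function and parameter names; `offset` is the number of days used to shift the reading level (for example, 365 for one grade higher or -365 for one grade lower), and ties are broken by book ID only to make the output deterministic.

```python
from typing import Dict, List


def recommend(predicted: Dict[str, float], today: int, m: int = 5, offset: int = 0) -> List[str]:
    """Return the m books whose predicted absolute loan dates are closest to today + offset.

    `today` is the absolute date of the recommendation; offset > 0 targets a higher
    reading level (365 is roughly one grade up) and offset < 0 a lower one.
    """
    target = today + offset
    # Sort by distance to the target date; ties are broken by book ID.
    return sorted(predicted, key=lambda book: (abs(predicted[book] - target), book))[:m]


predicted = {"a": 100.0, "b": 360.0, "c": 470.0, "d": 730.0, "e": 800.0}
print(recommend(predicted, today=365, m=2))              # ['b', 'c']
print(recommend(predicted, today=365, m=2, offset=365))  # ['d', 'e']
```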

5 Evaluation 

For evaluation, we collected loan histories of pupils in an elementary school where the 
grades range from first to sixth. Table 1 shows the statistics on the loan histories. 

Table 1. Statistics on the loan histories used for evaluation

Term of collection              30th Aug. 2007 to 1st Sep. 2008
Number of pupils                619
Total number of loaned books    14383


In the evaluation, we conducted two experiments. In the first, we evaluated how accurately the proposed method can recommend books similar to the books that the target pupil borrowed, as described in Section 5.1. In the second, we evaluated the capability of the proposed method in estimating reading level, as described in Section 5.2.
5.1 Experiment on Book Recommendation Accuracy

The experimental conditions and procedures are as follows. First, we randomly selected 10 target pupils (two from each grade, from the first to the fifth grade) from the loan histories; pupils in the sixth grade were not included in the experiment because of a limitation of the proposed method, which will be discussed in Section 6. Second, we predicted absolute loan dates for the target pupils using the proposed method; the threshold N, which was discussed in Section 4, was set to five. Third, we selected the five most difficult books and the five easiest books for each pupil according to the predicted loan dates. Then, these 10 books were shown to two elementary school teachers together with the corresponding loan history. Finally, the two teachers separately rated each book as related (to one or more of the books in the loan history in terms of topic), not-related, or unknown, referring to the corresponding loan history. The performance of the proposed method was measured by accuracy, defined by

$$\text{Accuracy} = \frac{\text{total number of books rated as related}}{\text{total number of books}}$$

For comparison, we implemented weighted slope one collaborative filtering [2], which has been shown to be effective in item recommendation. To fully implement weighted slope one collaborative filtering, we need taste information for each book, as described in Section 1. However, normal loan histories, such as the ones used in this evaluation, do not contain taste information. For this reason, we implemented it with loan histories in which all books were given an equal rating. In this setting, it can recommend related books but cannot rank them; all recommended books are equally favored. Therefore, 10 books were randomly chosen from the recommended books and shown to the two teachers for evaluation. The performance was measured by accuracy, as for the proposed method.
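To illustrate why equal ratings prevent ranking, here is a minimal Python sketch of the standard weighted slope one predictor (our own illustration, not the implementation used in the experiments; the data layout and names are hypothetical). Each item $i$ rated by the user contributes its rating plus the average deviation between $j$ and $i$, weighted by the number of users who rated both. When every rating is the same, every deviation is zero, so every prediction equals that rating.

```python
from typing import Dict, Optional

Ratings = Dict[str, Dict[str, float]]  # user -> {item: rating}


def weighted_slope_one(r: Ratings, u: str, j: str) -> Optional[float]:
    """Weighted slope one prediction of user u's rating for item j; None if no co-ratings."""
    num, den = 0.0, 0
    for i, r_ui in r[u].items():
        if i == j:
            continue
        co = [v for v in r if v != u and i in r[v] and j in r[v]]  # users who rated both i and j
        if not co:
            continue
        dev = sum(r[v][j] - r[v][i] for v in co) / len(co)  # average deviation of j from i
        num += (dev + r_ui) * len(co)  # weight each estimate by its co-rating count
        den += len(co)
    return num / den if den else None


# Loan histories turned into ratings: every borrowed book gets the same rating 1.0.
loans = {"A": {"x": 1.0}, "B": {"x": 1.0, "y": 1.0}, "C": {"x": 1.0, "z": 1.0}}
print(weighted_slope_one(loans, "A", "y"), weighted_slope_one(loans, "A", "z"))  # 1.0 1.0
```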

Table 2 shows the results. The proposed method achieves an accuracy of 0.600, which means that, on average, six out of the 10 books recommended by the proposed method are related to the books that the target pupil borrowed. When 60% of the recommended books are actually related, it should not be very difficult for teachers, or even pupils, to select the related ones.

Table 2. Evaluation on book recommendation accuracy

Method                                        Accuracy
Proposed method                               0.600
Weighted slope one collaborative filtering    0.435


Table 2 also shows that the proposed method outperforms the weighted slope one 
collaborative filtering. Indeed, the difference between the two is significant (normal 
approximation to the binomial test, p < 0.01). The performance of the weighted slope 
one collaborative filtering implies that its recommendation may confuse teachers and 
pupils because more than half of the recommended books are not relevant. 
5.2 Experiment on Reading Level Estimation

The experimental conditions and procedures are as follows. First, for each target pupil we made five pairs of books, each consisting of a book randomly selected from the five most difficult books and a book randomly selected from the five easiest books that the proposed method recommended to that pupil (50 pairs in total). Second, we randomly labeled the two books in each pair as A and B. Third, four human raters (undergraduate students) separately determined which book in each pair was more difficult by referring to a book search system that retrieves book information including the title, the author(s), the number of pages, the picture(s) (if available), the reading level (if available), and the synopsis (if available). Each rater gave each pair +1, -1, or 0, meaning that A is more difficult, B is more difficult, or the two are indistinguishable, respectively. We then merged the results by summing the four scores. If the sum was 3 or greater, A was determined to be more difficult; if it was -3 or smaller, B was determined to be more difficult. If the sum was between -1 and +1, the pair was determined to be indistinguishable. If the sum was -2 or 2, the first and second authors joined the four human raters, each giving the pair +1, -1, or 0; if the new sum was 3 or greater (-3 or smaller), A (B) was determined to be more difficult, and otherwise the pair was determined to be indistinguishable.
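The merging rule can be sketched in Python as follows; the function name and the representation of votes as lists of integers are our own, and the two additional votes (from the first and second authors) are consulted only when the first sum is -2 or 2.

```python
from typing import Optional, Sequence


def merge_votes(votes: Sequence[int], extra: Optional[Sequence[int]] = None) -> str:
    """Merge +1 (A harder) / -1 (B harder) / 0 (indistinguishable) votes.

    votes: the four raters' votes. extra: the two additional votes, used only
    when the first sum is -2 or 2.
    """
    s = sum(votes)
    if abs(s) == 2:
        if extra is None:
            raise ValueError("sum is -2 or 2: the two additional votes are needed")
        s += sum(extra)  # new sum over all six votes
    if s >= 3:
        return "A"
    if s <= -3:
        return "B"
    return "indistinguishable"  # sums between -1 and +1, or undecided after the re-vote


print(merge_votes([1, 1, 1, 0]))           # A
print(merge_votes([1, 1, 0, 0], [-1, 0]))  # indistinguishable
```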

As a result, 34 out of 50 pairs were distinguishable in terms of reading level. For these 34 pairs, the predictions of the proposed method agreed with the decisions of the human raters 62% of the time (21 out of 34 pairs). Although the results show that the predictions of the proposed method roughly agree with the decisions of the human raters, the agreement is not as high as we expected. We discuss the reasons in the next section.

6 Discussion 

The evaluation has shown that the proposed method is effective in recommending books 
related to the books that the target pupil borrowed. The reason is that the proposed 
method predicts absolute loan dates from the relevant books and the relevant pupils. 

This effect can be seen in the recommendation results. The proposed method is capable of recommending books in a series, as Table 3 shows: it recommended Astronomical observation 1, 4, and 9 to the pupil who borrowed Astronomical observation 8. Information about books in a series is useful for recommendation, since teachers and book database systems do not necessarily have it. More importantly, the results show that the proposed method is effective in recommending related books. For instance, it recommended Constellation observation 1, which is highly related to Astronomical observation 8, The wonder of the Earth, The birth of the great telescope Subaru, and Journey in the space. Table 4 shows another example.

By contrast, the performance of the proposed method concerning reading level is not as high as we expected. As described in the previous section, the difference in reading level was indistinguishable for 32% of the 50 pairs. For the rest, the predictions of the proposed method agreed with those of the human raters 62% of the time. One of the major reasons is that the loan histories used in the evaluation cover only one year (which was all we could collect). This means that the difference in reading level between the recommended books is at most one grade, and often much less. This explains why 32% of the 50 pairs were indistinguishable in terms of reading level. Considering this, we expect the reading level prediction of the proposed method to improve with loan histories covering a longer term. Another reason is related to the evaluation itself. Reading level is not easy to evaluate accurately; it was sometimes difficult for the human raters to determine which book was more difficult by referring only to the book search system. Other evaluation methods are possible, which we leave for future work.

Table 3. Example of book recommendation (books in series)

Borrowed Books
  Astronomical observation 8: Sun and stars
  Let’s do experiments about air
  Wonderful science for pupils 10
  Journey to the West. vol. 1
  The wonder of the Earth
  The birth of the great telescope Subaru
  Questions about earthquakes 1
  Experiments about the light and the sight
  Science games
  Journey in the space

Recommended Books
  Astronomical observation 1: Spring constellation
  Astronomical observation 4: Winter constellation
  Astronomical observation 9: Earth, moon, and planets
  Constellation observation 1: Find spring constellations
  Wanamuke the Witch
  Seton’s Wild Animals 8
  Football
  Brother Bear
  The great maze of Triceratops
  Zorori and a mysterious, magical girl


Table 4. Example of book recommendation (related books)

Borrowed Books

Recommended Books

Wonders of bats
Stag beetle
Kon and Aki
The wonder of cold
Invisible man Samgury in the Zokuzoku village
The wonder of brain
Big feet of Mr. alligator
Questions for body functions
Swallowtail
Morning glory
Ladybug
Oh, Grandma!
The wonder of teeth and mouth
Deko-chan
Firefly
Cabbage white butterfly
Mantis
Bat
Mickey mouse: The sorcerer’s apprentice
Kenta and rabbit
Zorori: The great operation of ghost
What and what questions from 100 pupils in first grade
Penguin patrol party
Mrs. Cat’s guest


This section has discussed the effectiveness of the proposed method in book recommendation. Here, it is also worthwhile to discuss its limitations. One limitation is that the proposed method is not effective in recommending books to pupils in their last year of school. For such pupils, it is often impossible to compute the differences of absolute loan dates in Equation (3), because there are no pupils in higher grades whose loan histories could supply them. This is why we excluded pupils in the sixth grade from the evaluation.

Another limitation is that the proposed method cannot recommend books that have never been borrowed; other systems based on collaborative filtering have the same limitation. By contrast, teachers, and librarians in particular, can properly recommend such books. Achieving this would require other techniques.

7 Conclusions 

This paper proposed a novel method for book recommendation based on Edu-mining. It has three advantages over the conventional methods: (i) it is inexpensive, (ii) it can recommend books related to the books that the target pupil borrowed, and (iii) the reading level is adjustable. The evaluation revealed that the proposed method achieves an accuracy of 60% in recommending related books and outperforms weighted slope one collaborative filtering. The evaluation also revealed that the reading levels predicted by the proposed method roughly agree with those determined by human raters.

For future work, we will investigate how the prediction of reading level can be evaluated 
more accurately. We will also investigate how other sources of information can be used 
to improve the proposed method. 